Train window: 2000-01-01 → 2022-12-31
Test window: 2023-01-01 → 2024-12-31
Dropped 45 rows due to NaNs after all feature engineering.
Final data shape after engineering & dropna: 9087 rows, 273 columns
Train rows: 8361
Test rows: 726
Comprehensive Weather Forecast Model Comparison: Chattanooga T2M
An XGBoost Approach to Multi-Day Temperature Forecasting
1. Introduction and Model Objectives
This document presents a comparative analysis of different XGBoost modeling strategies for forecasting daily mean temperature (T2M) in Chattanooga, TN, for 1 to 5 days ahead. Our primary objective is to evaluate the impact of various feature engineering techniques and model architectures on forecast accuracy.
We will compare four distinct models:
- Bare Bones Baseline: A minimalist model using only T2M lags and basic time features.
- Full Features (Non-Cascading): Utilizes all engineered features (derived, intra-city lags, cross-city lags, rolling windows) but trains independent models for each forecast horizon.
- Cascading Forecasts: Employs the full feature set and a sequential training strategy, where predictions from earlier horizons are fed as features to later horizons.
- Native Multi-Output: Uses the full feature set with XGBoost’s native multi-output tree strategy, aiming to learn interdependencies between all horizons in a single, unified model.
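Mechanically, these strategies differ mainly in how models and targets are wired together. The schematic below contrasts the independent-per-horizon and cascading wirings using a trivial stand-in regressor (not the actual XGBoost models or features; it exists only to show the data flow):

```python
import numpy as np

class MeanRegressor:
    """Stand-in for XGBRegressor: predicts the mean of its training target."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self
    def predict(self, X):
        return np.full(len(X), self.mean_)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                              # toy feature matrix
targets = {h: rng.normal(loc=h, size=100) for h in range(1, 6)}  # horizons 1-5

# Non-cascading: one independent model per horizon, identical feature set
independent = {h: MeanRegressor().fit(X, y) for h, y in targets.items()}

# Cascading: each horizon's predictions are appended as a feature for the next
X_cascade = X.copy()
cascading = {}
for h, y in targets.items():
    model = MeanRegressor().fit(X_cascade, y)
    cascading[h] = model
    preds = model.predict(X_cascade).reshape(-1, 1)
    X_cascade = np.hstack([X_cascade, preds])  # feeds the next horizon's model
```

The native multi-output strategy, by contrast, fits a single model whose trees emit all five horizon predictions at once, which XGBoost exposes via `multi_strategy="multi_output_tree"` (shown in Section 6).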
2. Common Data Loading and Feature Engineering Pipeline
This section defines the shared data acquisition and feature engineering steps that are applied consistently to all models for a fair, “apples-to-apples” comparison. This includes loading data for Chattanooga and all surrounding cities, calculating derived features, intra-city lags, cross-city lags, and rolling window features.
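A minimal sketch of what these shared steps look like on a toy series (the column-naming conventions mirror the pipeline's, but the values, lag counts, and window sizes here are purely illustrative):

```python
import numpy as np
import pandas as pd

# Toy daily series standing in for one city's T2M (values are illustrative)
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=30, freq="D"),
    "t2m": np.arange(30.0),
})

# Intra-city lags: the previous 1-3 days' temperatures as features
for lag in range(1, 4):
    df[f"t2m_lag_{lag}"] = df["t2m"].shift(lag)

# Rolling-window feature: trailing 7-day mean, shifted so today is excluded
df["t2m_roll_mean_7d"] = df["t2m"].shift(1).rolling(7).mean()

# Seasonal encoding: day-of-year mapped onto a circle
doy = df["date"].dt.dayofyear
df["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
df["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)

# Targets for horizons 1-5 days ahead: future values shifted backwards
for h in range(1, 6):
    suffix = "day_later" if h == 1 else "days_later"
    df[f"t2m_{h}_{suffix}"] = df["t2m"].shift(-h)

# Rows with NaNs introduced by lagging/shifting are dropped, as in the pipeline
df = df.dropna().reset_index(drop=True)
```

Cross-city lag features follow the same `shift` pattern, applied to each surrounding city's series and prefixed with the city name (e.g. `jasper_t2m_lag_1`).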
Data
This experiment uses daily data from the NASA POWER database, accessed through its API. Data from nine surrounding cities was also included: Scottsboro, Jasper, Winchester, Cleveland, Dalton, Athens, Fort Payne, Dayton, and Ringgold.
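For reference, the NASA POWER daily point endpoint can be queried over plain HTTP. The sketch below only builds a request URL (no network call is made); the endpoint path and parameter names follow the public POWER API documentation as best I know it, so verify against the current spec before relying on them:

```python
from urllib.parse import urlencode

BASE_URL = "https://power.larc.nasa.gov/api/temporal/daily/point"

def build_power_url(lat, lon, start, end, params=("T2M", "T2M_MAX", "T2M_MIN")):
    """Build a NASA POWER daily-point request URL (no network call is made)."""
    query = urlencode({
        "parameters": ",".join(params),  # e.g. T2M = temperature at 2 meters
        "community": "RE",               # assumed community code; check the docs
        "latitude": lat,
        "longitude": lon,
        "start": start,                  # YYYYMMDD
        "end": end,
        "format": "JSON",
    })
    return f"{BASE_URL}?{query}"

# Approximate coordinates for Chattanooga, TN, over the train window
url = build_power_url(35.05, -85.31, "20000101", "20221231")
```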
Show code
import plotly.express as px
import pandas as pd
# Combine train and test data for the context plot
context_plot_df = pd.concat([
    train_df[['date', 't2m']].assign(dataset='Train Data'),
    test_df[['date', 't2m']].assign(dataset='Test Data')
], ignore_index=True)
train_context_fig = px.line(
    context_plot_df, x="date", y="t2m", color="dataset",
    title=f"Actual Mean Temperature for {target_city_name} (Train vs. Test Data)",
    template="plotly_white",
    color_discrete_map={'Train Data': 'blue', 'Test Data': 'red'}
)
train_context_fig.update_layout(height=400, xaxis_title="Date", yaxis_title="T2M Mean Temp (°C)")
train_context_fig.show()
3. Model 1: Bare Bones Baseline (04.000)
This model serves as a minimalist benchmark. It uses only t2m (the current day's mean temperature), its lags for the previous 1–7 days, and basic time-based features (date_ordinal, doy_sin, doy_cos) from Chattanooga. All other, more complex features are excluded. It is trained using the MultiOutputRegressor strategy.
Show code
from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, max_error
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# --- Define Bare Bones Model's specific feature set ---
bare_bones_feature_cols = ["date_ordinal", "doy_sin", "doy_cos", "t2m"]
for lag in range(1, 8):  # t2m_lag_1 to t2m_lag_7
    bare_bones_feature_cols.append(f"t2m_lag_{lag}")
# --- SECTION 4. Train XGBoost Model (Bare Bones) ---
# Prepare arrays for multi-output training
iX_train_bare = train_df[bare_bones_feature_cols]
iy_train_bare = train_df[target_cols_all_horizons] # All 5 targets
xgb_base_model_bare = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
multi_output_model_bare = MultiOutputRegressor(estimator=xgb_base_model_bare, n_jobs=-1)
print(f"\n--- Training Bare Bones Model for targets: {target_cols_all_horizons} ---")
multi_output_model_bare.fit(iX_train_bare, iy_train_bare)
models_bare = {"multi_output_model": multi_output_model_bare}
# --- SECTION 5. Forecast & Evaluate (Bare Bones) ---
df_eval_bare = pd.DataFrame({"date": test_df["date"]})
X_test_bare = test_df[bare_bones_feature_cols]
y_pred_all_targets_bare = models_bare["multi_output_model"].predict(X_test_bare)
current_model_results = []
for i, target_col in enumerate(target_cols_all_horizons):
    df_eval_bare[f"y_true_{target_col}"] = test_df[target_col]
    df_eval_bare[f"y_pred_{target_col}"] = y_pred_all_targets_bare[:, i]
    y_true = df_eval_bare[f"y_true_{target_col}"]
    y_pred = df_eval_bare[f"y_pred_{target_col}"]
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    max_err = max_error(y_true, y_pred)
    current_model_results.append({
        'Model': 'Bare Bones',
        'Horizon': target_col.replace('t2m_', '').replace('_later', ' day(s)'),
        'RMSE': rmse,
        'MAE': mae,
        'R2': r2,
        'Max_Error': max_err
    })
all_model_comparison_results.extend(current_model_results)
print("\n--- Metrics for Bare Bones Model (Test Set) ---")
display(pd.DataFrame(current_model_results))
# --- SECTION 6.0 Plot Results (Bare Bones) ---
# Plot only the test period; the actual series is sourced from df_eval_bare
plot_df_list_bare = []
plot_df_list_bare.append(df_eval_bare[["date", f"y_true_{target_cols_all_horizons[0]}"]].assign(series="Actual (Today's Mean Temp)").rename(columns={f"y_true_{target_cols_all_horizons[0]}": "temperature"}))
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', '')
    plot_df_list_bare.append(df_eval_bare[["date", f"y_true_{target_col}"]].assign(series=f"Test Actual ({horizon_name})").rename(columns={f"y_true_{target_col}": "temperature"}))
    plot_df_list_bare.append(df_eval_bare[["date", f"y_pred_{target_col}"]].assign(series=f"Test Pred ({horizon_name})").rename(columns={f"y_pred_{target_col}": "temperature"}))
plot_df_bare = pd.concat(plot_df_list_bare, ignore_index=True)
fig_bare = px.line(plot_df_bare, x="date", y="temperature", color="series",
                   title=f"XGBoost Bare Bones Baseline for {target_city_name} - T2M Forecasts",
                   template="plotly_white", color_discrete_map=color_map_plot, line_dash_map=line_dash_map_plot)
fig_bare.update_layout(
    height=600,
    xaxis_title="Date",
    yaxis_title="T2M Mean Temp (°C)",
    legend=dict(
        x=1,                              # x=1 is the far right
        y=1,                              # y=1 is the top
        xanchor="right",
        yanchor="top",
        bgcolor="rgba(255,255,255,0.7)",  # semi-transparent white background
        bordercolor="black",
        borderwidth=1
    )
)
fig_bare.show()
# --- SECTION 6.1 Plot differences (Bare Bones) ---
plot_error_df_list_bare = []
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', ' day(s)')
    error_col = f"error_{horizon_name}"
    df_eval_bare[error_col] = df_eval_bare[f"y_true_{target_col}"] - df_eval_bare[f"y_pred_{target_col}"]
    plot_error_df_list_bare.append(df_eval_bare[["date", error_col]].assign(series=f"Error ({horizon_name})").rename(columns={error_col: "error"}))
plot_error_df_bare = pd.concat(plot_error_df_list_bare, ignore_index=True)
fig_error_bare = px.line(plot_error_df_bare, x="date", y="error", color="series",
title=f"Forecast Error for {target_city_name} (Test Set - Bare Bones)",
labels={"error": "Prediction Error (°C)", "date": "Date"},
template="plotly_white", color_discrete_map=error_color_map_plot)
fig_error_bare.add_hline(y=0, line_dash="dash", line_color="grey", annotation_text="Zero Error")
fig_error_bare.update_layout(height=600)
fig_error_bare.show()
# --- SECTION 6.2 Feature Importance (Bare Bones) ---
for i, target_col in enumerate(target_cols_all_horizons):
    # Get the specific XGBoost estimator for this target horizon
    model_for_importance = models_bare["multi_output_model"].estimators_[i]
    feature_importances = model_for_importance.feature_importances_
    importance_df_bare = pd.DataFrame({
        'Feature': bare_bones_feature_cols,
        'Importance': feature_importances
    })
    importance_df_bare = importance_df_bare.sort_values(by='Importance', ascending=False)
    # Plot the top 20 feature importances for the current target
    plt.figure(figsize=(12, 8))
    sns.barplot(x='Importance', y='Feature', data=importance_df_bare.head(20), palette='viridis')
    plt.title(f'Top 20 Feature Importances for Bare Bones Model (Target: {target_col})')
    plt.xlabel('Importance (F-score or Gain)')
    plt.ylabel('Feature')
    plt.grid(axis='x', linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()
--- Training Bare Bones Model for targets: ['t2m_1_day_later', 't2m_2_days_later', 't2m_3_days_later', 't2m_4_days_later', 't2m_5_days_later'] ---
--- Metrics for Bare Bones Model (Test Set) ---
|   | Model | Horizon | RMSE | MAE | R2 | Max_Error |
|---|---|---|---|---|---|---|
| 0 | Bare Bones | 1_day day(s) | 2.493334 | 1.860813 | 0.912683 | 9.279320 |
| 1 | Bare Bones | 2_days day(s) | 3.437763 | 2.617557 | 0.834063 | 12.168999 |
| 2 | Bare Bones | 3_days day(s) | 3.816496 | 2.905861 | 0.795535 | 13.472501 |
| 3 | Bare Bones | 4_days day(s) | 3.854804 | 2.964682 | 0.791518 | 13.387121 |
| 4 | Bare Bones | 5_days day(s) | 3.968461 | 3.026663 | 0.778850 | 14.513976 |
4. Model 2: Full Features (Non-Cascading - from 03.621)
This model utilizes all engineered features (derived, intra-city lags, cross-city from all 9 cities, and rolling windows) but trains independent models for each forecast horizon (via MultiOutputRegressor).
Show code
from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, max_error
# --- SECTION 4. Train XGBoost Model (Full Features, Non-Cascading) ---
iX_train_full_noncascading = train_df[full_feature_set_cols]
iy_train_full_noncascading = train_df[target_cols_all_horizons]
xgb_base_model_full_noncascading = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
multi_output_model_full_noncascading = MultiOutputRegressor(estimator=xgb_base_model_full_noncascading, n_jobs=-1)
print(f"\n--- Training Full Features (Non-Cascading) Model for targets: {target_cols_all_horizons} ---")
multi_output_model_full_noncascading.fit(iX_train_full_noncascading, iy_train_full_noncascading)
models_full_noncascading = {"multi_output_model": multi_output_model_full_noncascading}
# --- SECTION 5. Forecast & Evaluate (Full Features, Non-Cascading) ---
df_eval_full_noncascading = pd.DataFrame({"date": test_df["date"]})
X_test_full_noncascading = test_df[full_feature_set_cols]
y_pred_all_targets_full_noncascading = models_full_noncascading["multi_output_model"].predict(X_test_full_noncascading)
current_model_results = []
for i, target_col in enumerate(target_cols_all_horizons):
    df_eval_full_noncascading[f"y_true_{target_col}"] = test_df[target_col]
    df_eval_full_noncascading[f"y_pred_{target_col}"] = y_pred_all_targets_full_noncascading[:, i]
    y_true = df_eval_full_noncascading[f"y_true_{target_col}"]
    y_pred = df_eval_full_noncascading[f"y_pred_{target_col}"]
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    max_err = max_error(y_true, y_pred)
    current_model_results.append({
        'Model': 'Full Features (Non-Cascading)',
        'Horizon': target_col.replace('t2m_', '').replace('_later', ' day(s)'),
        'RMSE': rmse,
        'MAE': mae,
        'R2': r2,
        'Max_Error': max_err
    })
all_model_comparison_results.extend(current_model_results)
print(f"\n--- Metrics for Full Features (Non-Cascading) Model (Test Set) ---")
display(pd.DataFrame(current_model_results))
# --- SECTION 6.0 Plot Results (Full Features, Non-Cascading) ---
# Plot only the test period; the actual series is sourced from df_eval_full_noncascading
plot_df_list_full_noncascading = []
plot_df_list_full_noncascading.append(df_eval_full_noncascading[["date", f"y_true_{target_cols_all_horizons[0]}"]].assign(series="Actual (Today's Mean Temp)").rename(columns={f"y_true_{target_cols_all_horizons[0]}": "temperature"}))
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', ' day(s)')
    plot_df_list_full_noncascading.append(df_eval_full_noncascading[["date", f"y_true_{target_col}"]].assign(series=f"Test Actual ({horizon_name})").rename(columns={f"y_true_{target_col}": "temperature"}))
    plot_df_list_full_noncascading.append(df_eval_full_noncascading[["date", f"y_pred_{target_col}"]].assign(series=f"Test Pred ({horizon_name})").rename(columns={f"y_pred_{target_col}": "temperature"}))
plot_df_full_noncascading = pd.concat(plot_df_list_full_noncascading, ignore_index=True)
fig_full_noncascading = px.line(plot_df_full_noncascading, x="date", y="temperature", color="series",
title=f"XGBoost Full Features (Non-Cascading) for {target_city_name} - T2M Forecasts",
template="plotly_white", color_discrete_map=color_map_plot, line_dash_map=line_dash_map_plot)
fig_full_noncascading.update_layout(height=600, xaxis_title="Date", yaxis_title="T2M Mean Temp (°C)")
fig_full_noncascading.show()
# --- SECTION 6.1 Plot differences (Full Features, Non-Cascading) ---
plot_error_df_list_full_noncascading = []
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', ' day(s)')
    error_col = f"error_{horizon_name}"
    df_eval_full_noncascading[error_col] = df_eval_full_noncascading[f"y_true_{target_col}"] - df_eval_full_noncascading[f"y_pred_{target_col}"]
    plot_error_df_list_full_noncascading.append(df_eval_full_noncascading[["date", error_col]].assign(series=f"Error ({horizon_name})").rename(columns={error_col: "error"}))
plot_error_df_full_noncascading = pd.concat(plot_error_df_list_full_noncascading, ignore_index=True)
fig_error_full_noncascading = px.line(plot_error_df_full_noncascading, x="date", y="error", color="series",
title=f"Forecast Error for {target_city_name} (Test Set - Full Features, Non-Cascading)",
labels={"error": "Prediction Error (°C)", "date": "Date"},
template="plotly_white", color_discrete_map=error_color_map_plot)
fig_error_full_noncascading.add_hline(y=0, line_dash="dash", line_color="grey", annotation_text="Zero Error")
fig_error_full_noncascading.update_layout(height=600)
fig_error_full_noncascading.show()
# --- SECTION 6.2 Feature Importance (Full Features, Non-Cascading) ---
print(f"\n--- Feature Importances for Full Features (Non-Cascading) Model ---")
for i, target_col in enumerate(target_cols_all_horizons):
    model_for_importance_full_noncascading = models_full_noncascading["multi_output_model"].estimators_[i]
    feature_importances_full_noncascading = model_for_importance_full_noncascading.feature_importances_
    importance_df_full_noncascading = pd.DataFrame({'Feature': full_feature_set_cols, 'Importance': feature_importances_full_noncascading})
    importance_df_full_noncascading = importance_df_full_noncascading.sort_values(by='Importance', ascending=False)
    print(f"\nFeature Importances for {target_col} (Full Features, Non-Cascading):")
    print(importance_df_full_noncascading.head(20))
    plt.figure(figsize=(12, 8))
    sns.barplot(x='Importance', y='Feature', data=importance_df_full_noncascading.head(20), palette='viridis')
    plt.title(f'Top 20 Feature Importances for {target_col} (Full Features, Non-Cascading)')
    plt.xlabel('Importance (F-score or Gain)')
    plt.ylabel('Feature')
    plt.grid(axis='x', linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()
--- Training Full Features (Non-Cascading) Model for targets: ['t2m_1_day_later', 't2m_2_days_later', 't2m_3_days_later', 't2m_4_days_later', 't2m_5_days_later'] ---
--- Metrics for Full Features (Non-Cascading) Model (Test Set) ---
|   | Model | Horizon | RMSE | MAE | R2 | Max_Error |
|---|---|---|---|---|---|---|
| 0 | Full Features (Non-Cascading) | 1_day day(s) | 2.011463 | 1.489929 | 0.943172 | 9.373296 |
| 1 | Full Features (Non-Cascading) | 2_days day(s) | 3.084712 | 2.352502 | 0.866396 | 11.284673 |
| 2 | Full Features (Non-Cascading) | 3_days day(s) | 3.617922 | 2.787716 | 0.816258 | 13.461348 |
| 3 | Full Features (Non-Cascading) | 4_days day(s) | 3.716141 | 2.879343 | 0.806247 | 13.819242 |
| 4 | Full Features (Non-Cascading) | 5_days day(s) | 3.929810 | 3.053671 | 0.783137 | 14.496209 |
--- Feature Importances for Full Features (Non-Cascading) Model ---
Feature Importances for t2m_1_day_later (Full Features, Non-Cascading):
Feature Importance
18 t2m_avg_from_max_min 0.727284
6 t2m_max 0.076977
4 t2m 0.046108
3 doy_cos 0.004483
15 prectotcorr 0.003251
11 wd10m 0.003026
20 wind_u 0.002754
184 winchester_t2m_avg_from_max_min_lag_1 0.002551
10 ws10m 0.002284
217 fort payne_t2m_lag_2 0.002019
19 ps_change_24hr 0.001876
174 jasper_t2m_avg_from_max_min_lag_1 0.001668
127 t2m_avg_from_max_min_lag_7 0.001659
12 ps 0.001614
22 t2m_range_normalized 0.001604
5 t2m_min 0.001582
21 wind_v 0.001524
124 t2m_avg_from_max_min_lag_4 0.001448
13 allsky_sw_dwn 0.001372
207 athens_t2m_lag_2 0.001315

Feature Importances for t2m_2_days_later (Full Features, Non-Cascading):
Feature Importance
6 t2m_max 0.461344
18 t2m_avg_from_max_min 0.119644
4 t2m 0.054304
3 doy_cos 0.028003
264 t2m_avg_from_max_min_roll_mean_7d 0.011241
248 t2m_roll_mean_7d 0.008950
175 jasper_t2m_avg_from_max_min_lag_2 0.008186
127 t2m_avg_from_max_min_lag_7 0.007067
174 jasper_t2m_avg_from_max_min_lag_1 0.005733
12 ps 0.005435
123 t2m_avg_from_max_min_lag_3 0.004674
168 jasper_ps_lag_1 0.004597
166 jasper_t2m_lag_1 0.004152
246 t2m_roll_mean_3d 0.003866
10 ws10m 0.003383
158 scottsboro_ps_lag_1 0.003150
15 prectotcorr 0.003064
2 doy_sin 0.003025
227 dayton_t2m_lag_2 0.002885
20 wind_u 0.002863

Feature Importances for t2m_3_days_later (Full Features, Non-Cascading):
Feature Importance
3 doy_cos 0.289576
248 t2m_roll_mean_7d 0.230206
4 t2m 0.032064
6 t2m_max 0.027875
18 t2m_avg_from_max_min 0.024910
264 t2m_avg_from_max_min_roll_mean_7d 0.013331
246 t2m_roll_mean_3d 0.012552
2 doy_sin 0.007931
262 t2m_avg_from_max_min_roll_mean_3d 0.007714
177 winchester_t2m_lag_2 0.006007
12 ps 0.005631
29 t2m_lag_7 0.005385
127 t2m_avg_from_max_min_lag_7 0.005090
1 day_of_year 0.004884
226 dayton_t2m_lag_1 0.004339
126 t2m_avg_from_max_min_lag_6 0.004036
252 dewpoint_roll_mean_7d 0.003414
227 dayton_t2m_lag_2 0.003358
168 jasper_ps_lag_1 0.003138
28 t2m_lag_6 0.003057

Feature Importances for t2m_4_days_later (Full Features, Non-Cascading):
Feature Importance
3 doy_cos 0.308031
248 t2m_roll_mean_7d 0.134334
264 t2m_avg_from_max_min_roll_mean_7d 0.034468
262 t2m_avg_from_max_min_roll_mean_3d 0.023982
18 t2m_avg_from_max_min 0.017123
122 t2m_avg_from_max_min_lag_2 0.016080
6 t2m_max 0.014836
4 t2m 0.009422
2 doy_sin 0.007194
185 winchester_t2m_avg_from_max_min_lag_2 0.007005
1 day_of_year 0.006551
224 fort payne_t2m_avg_from_max_min_lag_1 0.004471
119 t2m_dewpoint_spread_lag_6 0.004425
12 ps 0.004082
256 ps_roll_mean_7d 0.004043
217 fort payne_t2m_lag_2 0.003997
126 t2m_avg_from_max_min_lag_6 0.003858
218 fort payne_ps_lag_1 0.003834
179 winchester_ps_lag_2 0.003741
28 t2m_lag_6 0.003275

Feature Importances for t2m_5_days_later (Full Features, Non-Cascading):
Feature Importance
3 doy_cos 0.304646
248 t2m_roll_mean_7d 0.159524
262 t2m_avg_from_max_min_roll_mean_3d 0.032932
125 t2m_avg_from_max_min_lag_5 0.009109
6 t2m_max 0.008415
184 winchester_t2m_avg_from_max_min_lag_1 0.007402
215 athens_t2m_avg_from_max_min_lag_2 0.006810
2 doy_sin 0.006574
1 day_of_year 0.006352
246 t2m_roll_mean_3d 0.005623
121 t2m_avg_from_max_min_lag_1 0.005614
124 t2m_avg_from_max_min_lag_4 0.005467
218 fort payne_ps_lag_1 0.004774
264 t2m_avg_from_max_min_roll_mean_7d 0.004419
252 dewpoint_roll_mean_7d 0.004118
26 t2m_lag_4 0.004058
256 ps_roll_mean_7d 0.004041
18 t2m_avg_from_max_min 0.004028
214 athens_t2m_avg_from_max_min_lag_1 0.003999
235 dayton_t2m_avg_from_max_min_lag_2 0.003770

5. Model 3: Cascading Forecasts (03.650)
This model builds on the full feature set (derived, intra-city lags, cross-city from all 9 cities, and rolling windows) but uses a sequential training strategy, where predictions from earlier horizons are fed as features to later horizons.
Show code
from xgboost import XGBRegressor
from sklearn.base import clone
# --- SECTION 4. Train Cascading XGBoost Models ---
models_cascading = {}
train_pred_features_df_cascading = pd.DataFrame(index=train_df.index)
test_pred_features_df_cascading = pd.DataFrame(index=test_df.index)
xgb_base_model_cascading = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
current_feature_cols_cascading_training = list(full_feature_set_cols)
for i, target_col in enumerate(target_cols_all_horizons):
    print(f"\n--- Training Cascading Model for {target_col} ---")
    iX_train_cascading = train_df[current_feature_cols_cascading_training]
    iy_train_cascading = train_df[target_col]
    model_cascading = clone(xgb_base_model_cascading)
    print(f"Features used for {target_col}: {len(current_feature_cols_cascading_training)} features")
    model_cascading.fit(iX_train_cascading, iy_train_cascading)
    models_cascading[target_col] = model_cascading
    train_preds_cascading = model_cascading.predict(train_df[current_feature_cols_cascading_training])
    new_pred_feature_name = f"predicted_{target_col}"
    train_pred_features_df_cascading[new_pred_feature_name] = train_preds_cascading
    test_preds_cascading = model_cascading.predict(test_df[current_feature_cols_cascading_training])
    test_pred_features_df_cascading[new_pred_feature_name] = test_preds_cascading
    if i < len(target_cols_all_horizons) - 1:
        current_feature_cols_cascading_training.append(new_pred_feature_name)
        train_df[new_pred_feature_name] = train_pred_features_df_cascading[new_pred_feature_name]
        test_df[new_pred_feature_name] = test_pred_features_df_cascading[new_pred_feature_name]
print("\nAll cascading models trained.")
# --- SECTION 5. Forecast & Evaluate (Cascading) ---
df_eval_cascading = pd.DataFrame({"date": test_df["date"]})
current_model_results = []
for target_col in target_cols_all_horizons:
    df_eval_cascading[f"y_true_{target_col}"] = test_df[target_col]
    df_eval_cascading[f"y_pred_{target_col}"] = test_pred_features_df_cascading[f"predicted_{target_col}"]
    y_true = df_eval_cascading[f"y_true_{target_col}"]
    y_pred = df_eval_cascading[f"y_pred_{target_col}"]
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    max_err = max_error(y_true, y_pred)
    current_model_results.append({
        'Model': 'Cascading',
        'Horizon': target_col.replace('t2m_', '').replace('_later', ' day(s)'),
        'RMSE': rmse,
        'MAE': mae,
        'R2': r2,
        'Max_Error': max_err
    })
all_model_comparison_results.extend(current_model_results)
print(f"\n--- Metrics for Cascading Model (Test Set) ---")
display(pd.DataFrame(current_model_results))
# --- SECTION 6.0 Plot Results (Cascading) ---
# Plot only the test period; the actual series is sourced from df_eval_cascading
plot_df_list_cascading = []
plot_df_list_cascading.append(df_eval_cascading[["date", f"y_true_{target_cols_all_horizons[0]}"]].assign(series="Actual (Today's Mean Temp)").rename(columns={f"y_true_{target_cols_all_horizons[0]}": "temperature"}))
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', ' day(s)')
    plot_df_list_cascading.append(df_eval_cascading[["date", f"y_true_{target_col}"]].assign(series=f"Test Actual ({horizon_name})").rename(columns={f"y_true_{target_col}": "temperature"}))
    plot_df_list_cascading.append(df_eval_cascading[["date", f"y_pred_{target_col}"]].assign(series=f"Test Pred ({horizon_name})").rename(columns={f"y_pred_{target_col}": "temperature"}))
plot_df_cascading = pd.concat(plot_df_list_cascading, ignore_index=True)
fig_cascading = px.line(plot_df_cascading, x="date", y="temperature", color="series",
title=f"XGBoost Cascading Forecasts for {target_city_name} - T2M Forecasts",
template="plotly_white", color_discrete_map=color_map_plot, line_dash_map=line_dash_map_plot)
fig_cascading.update_layout(height=600, xaxis_title="Date", yaxis_title="T2M Mean Temp (°C)")
fig_cascading.show()
# --- SECTION 6.1 Plot differences (Cascading) ---
plot_error_df_list_cascading = []
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', ' day(s)')
    error_col = f"error_{horizon_name}"
    df_eval_cascading[error_col] = df_eval_cascading[f"y_true_{target_col}"] - df_eval_cascading[f"y_pred_{target_col}"]
    plot_error_df_list_cascading.append(df_eval_cascading[["date", error_col]].assign(series=f"Error ({horizon_name})").rename(columns={error_col: "error"}))
plot_error_df_cascading = pd.concat(plot_error_df_list_cascading, ignore_index=True)
fig_error_cascading = px.line(plot_error_df_cascading, x="date", y="error", color="series",
title=f"Forecast Error for {target_city_name} (Test Set - Cascading)",
labels={"error": "Prediction Error (°C)", "date": "Date"},
template="plotly_white", color_discrete_map=error_color_map_plot)
fig_error_cascading.add_hline(y=0, line_dash="dash", line_color="grey", annotation_text="Zero Error")
fig_error_cascading.update_layout(height=600)
fig_error_cascading.show()
# --- SECTION 6.2 Feature Importance (Cascading) ---
print(f"\n--- Feature Importances for Cascading Models ---")
for i, target_col in enumerate(target_cols_all_horizons):
    model_for_importance_cascading = models_cascading[target_col]
    feature_importances_cascading = model_for_importance_cascading.feature_importances_
    # Rebuild this model's feature list: full set plus earlier horizons' predictions
    model_specific_feature_cols_cascading = list(full_feature_set_cols)
    for prev_i in range(i):
        model_specific_feature_cols_cascading.append(f"predicted_{target_cols_all_horizons[prev_i]}")
    importance_df_cascading = pd.DataFrame({'Feature': model_specific_feature_cols_cascading, 'Importance': feature_importances_cascading})
    importance_df_cascading = importance_df_cascading.sort_values(by='Importance', ascending=False)
    print(f"\nFeature Importances for {target_col} (Cascading):")
    print(importance_df_cascading.head(20))
    plt.figure(figsize=(12, 8))
    sns.barplot(x='Importance', y='Feature', data=importance_df_cascading.head(20), palette='viridis')
    plt.title(f'Top 20 Feature Importances for {target_col} (Cascading)')
    plt.xlabel('Importance (F-score or Gain)')
    plt.ylabel('Feature')
    plt.grid(axis='x', linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()
--- Training Cascading Model for t2m_1_day_later ---
Features used for t2m_1_day_later: 266 features
--- Training Cascading Model for t2m_2_days_later ---
Features used for t2m_2_days_later: 267 features
--- Training Cascading Model for t2m_3_days_later ---
Features used for t2m_3_days_later: 268 features
--- Training Cascading Model for t2m_4_days_later ---
Features used for t2m_4_days_later: 269 features
--- Training Cascading Model for t2m_5_days_later ---
Features used for t2m_5_days_later: 270 features
All cascading models trained.
--- Metrics for Cascading Model (Test Set) ---
|   | Model | Horizon | RMSE | MAE | R2 | Max_Error |
|---|---|---|---|---|---|---|
| 0 | Cascading | 1_day day(s) | 2.011463 | 1.489929 | 0.943172 | 9.373296 |
| 1 | Cascading | 2_days day(s) | 3.142463 | 2.382870 | 0.861346 | 12.095691 |
| 2 | Cascading | 3_days day(s) | 3.671489 | 2.797543 | 0.810777 | 14.031865 |
| 3 | Cascading | 4_days day(s) | 3.939334 | 3.025852 | 0.782275 | 13.395379 |
| 4 | Cascading | 5_days day(s) | 4.013391 | 3.094258 | 0.773814 | 13.291235 |
--- Feature Importances for Cascading Models ---
Feature Importances for t2m_1_day_later (Cascading):
Feature Importance
18 t2m_avg_from_max_min 0.727284
6 t2m_max 0.076977
4 t2m 0.046108
3 doy_cos 0.004483
15 prectotcorr 0.003251
11 wd10m 0.003026
20 wind_u 0.002754
184 winchester_t2m_avg_from_max_min_lag_1 0.002551
10 ws10m 0.002284
217 fort payne_t2m_lag_2 0.002019
19 ps_change_24hr 0.001876
174 jasper_t2m_avg_from_max_min_lag_1 0.001668
127 t2m_avg_from_max_min_lag_7 0.001659
12 ps 0.001614
22 t2m_range_normalized 0.001604
5 t2m_min 0.001582
21 wind_v 0.001524
124 t2m_avg_from_max_min_lag_4 0.001448
13 allsky_sw_dwn 0.001372
207 athens_t2m_lag_2 0.001315

Feature Importances for t2m_2_days_later (Cascading):
Feature Importance
266 predicted_t2m_1_day_later 0.435157
3 doy_cos 0.011623
21 wind_v 0.010863
12 ps 0.008163
177 winchester_t2m_lag_2 0.007976
51 dewpoint_lag_1 0.006614
8 dewpoint 0.006485
5 t2m_min 0.005716
18 t2m_avg_from_max_min 0.005442
207 athens_t2m_lag_2 0.005159
17 t2m_dewpoint_spread 0.004861
14 allsky_lw_dwn 0.004619
122 t2m_avg_from_max_min_lag_2 0.004476
9 rh2m 0.004461
22 t2m_range_normalized 0.004362
214 athens_t2m_avg_from_max_min_lag_1 0.004241
19 ps_change_24hr 0.004136
127 t2m_avg_from_max_min_lag_7 0.004087
119 t2m_dewpoint_spread_lag_6 0.004034
222 fort payne_wind_v_lag_1 0.004012

Feature Importances for t2m_3_days_later (Cascading):
Feature Importance
267 predicted_t2m_2_days_later 0.411059
3 doy_cos 0.008865
266 predicted_t2m_1_day_later 0.008116
227 dayton_t2m_lag_2 0.007661
166 jasper_t2m_lag_1 0.006897
168 jasper_ps_lag_1 0.005644
246 t2m_roll_mean_3d 0.005621
167 jasper_t2m_lag_2 0.005546
217 fort payne_t2m_lag_2 0.005324
224 fort payne_t2m_avg_from_max_min_lag_1 0.005315
214 athens_t2m_avg_from_max_min_lag_1 0.005199
235 dayton_t2m_avg_from_max_min_lag_2 0.005177
177 winchester_t2m_lag_2 0.005113
110 t2m_ratio_to_t2m_min_lag_4 0.004836
127 t2m_avg_from_max_min_lag_7 0.004367
38 t2m_max_lag_2 0.004304
225 fort payne_t2m_avg_from_max_min_lag_2 0.004264
185 winchester_t2m_avg_from_max_min_lag_2 0.003921
176 winchester_t2m_lag_1 0.003894
219 fort payne_ps_lag_2 0.003882

Feature Importances for t2m_4_days_later (Cascading):
Feature Importance
268 predicted_t2m_3_days_later 0.399316
3 doy_cos 0.011178
234 dayton_t2m_avg_from_max_min_lag_1 0.008493
167 jasper_t2m_lag_2 0.008037
262 t2m_avg_from_max_min_roll_mean_3d 0.007185
267 predicted_t2m_2_days_later 0.006732
18 t2m_avg_from_max_min 0.006692
185 winchester_t2m_avg_from_max_min_lag_2 0.006123
246 t2m_roll_mean_3d 0.005963
176 winchester_t2m_lag_1 0.005246
215 athens_t2m_avg_from_max_min_lag_2 0.004772
122 t2m_avg_from_max_min_lag_2 0.004666
266 predicted_t2m_1_day_later 0.004643
126 t2m_avg_from_max_min_lag_6 0.004401
224 fort payne_t2m_avg_from_max_min_lag_1 0.004004
169 jasper_ps_lag_2 0.003795
125 t2m_avg_from_max_min_lag_5 0.003767
264 t2m_avg_from_max_min_roll_mean_7d 0.003662
33 t2m_min_lag_4 0.003649
89 allsky_sw_dwn_lag_4 0.003552

Feature Importances for t2m_5_days_later (Cascading):
Feature Importance
269 predicted_t2m_4_days_later 0.405377
3 doy_cos 0.015155
248 t2m_roll_mean_7d 0.010913
268 predicted_t2m_3_days_later 0.007352
207 athens_t2m_lag_2 0.006097
267 predicted_t2m_2_days_later 0.006063
122 t2m_avg_from_max_min_lag_2 0.005713
126 t2m_avg_from_max_min_lag_6 0.005179
266 predicted_t2m_1_day_later 0.004981
184 winchester_t2m_avg_from_max_min_lag_1 0.004843
235 dayton_t2m_avg_from_max_min_lag_2 0.004669
27 t2m_lag_5 0.004618
252 dewpoint_roll_mean_7d 0.004489
18 t2m_avg_from_max_min 0.004313
53 dewpoint_lag_3 0.004147
42 t2m_max_lag_6 0.004098
216 fort payne_t2m_lag_1 0.003874
167 jasper_t2m_lag_2 0.003843
262 t2m_avg_from_max_min_roll_mean_3d 0.003829
157 scottsboro_t2m_lag_2 0.003787

6. Model 4: Native Multi-Output
This model uses the full feature set with XGBoost’s native multi-output tree strategy, aiming to learn interdependencies between all 5 horizons in a single, unified model.
Show code
from xgboost import XGBRegressor

# --- SECTION 4. Train XGBoost Model (Native Multi-Output) ---
X_train_native = train_df[full_feature_set_cols]
y_train_native = train_df[target_cols_all_horizons]

xgb_native_multi_output_model = XGBRegressor(
    n_estimators=100,
    learning_rate=0.1,
    random_state=0,
    tree_method="hist",
    multi_strategy="multi_output_tree",  # requires xgboost >= 2.0
)

print(f"\n--- Training Native Multi-Output Model for targets: {target_cols_all_horizons} ---")
xgb_native_multi_output_model.fit(X_train_native, y_train_native)
models_native_multioutput = {"multi_output_model": xgb_native_multi_output_model}
# --- SECTION 5. Forecast & Evaluate (Native Multi-Output) ---
df_eval_native_multioutput = pd.DataFrame({"date": test_df["date"]})
X_test_native = test_df[full_feature_set_cols]
y_pred_all_targets_native = models_native_multioutput["multi_output_model"].predict(X_test_native)

current_model_results = []
for i, target_col in enumerate(target_cols_all_horizons):
    df_eval_native_multioutput[f"y_true_{target_col}"] = test_df[target_col]
    df_eval_native_multioutput[f"y_pred_{target_col}"] = y_pred_all_targets_native[:, i]

    y_true = df_eval_native_multioutput[f"y_true_{target_col}"]
    y_pred = df_eval_native_multioutput[f"y_pred_{target_col}"]

    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    max_err = max_error(y_true, y_pred)

    print(f"\nMetrics for {target_col} (Native Multi-Output):")
    print(f" RMSE: {rmse:.3f}, MAE: {mae:.3f}, R2: {r2:.3f}, Max Error: {max_err:.3f}")

    current_model_results.append({
        'Model': 'Native Multi-Output',
        'Horizon': target_col.replace('t2m_', '').replace('_later', ' day(s)'),
        'RMSE': rmse,
        'MAE': mae,
        'R2': r2,
        'Max_Error': max_err
    })

all_model_comparison_results.extend(current_model_results)
print("\n--- Metrics for Native Multi-Output Model (Test Set) ---")
display(pd.DataFrame(current_model_results))
# --- SECTION 6.0 Plot Results (Native Multi-Output) ---
plot_df_list_native = []
plot_df_list_native.append(df_final[["date", "t2m"]].assign(series="Actual (Today's Mean Temp)").rename(columns={"t2m": "temperature"}))
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', ' day(s)')
    plot_df_list_native.append(train_df[["date", target_col]].assign(series=f"Train (t2m_{horizon_name})").rename(columns={target_col: "temperature"}))
    plot_df_list_native.append(df_eval_native_multioutput[["date", f"y_true_{target_col}"]].assign(series=f"Test Actual ({horizon_name})").rename(columns={f"y_true_{target_col}": "temperature"}))
    plot_df_list_native.append(df_eval_native_multioutput[["date", f"y_pred_{target_col}"]].assign(series=f"Test Pred ({horizon_name})").rename(columns={f"y_pred_{target_col}": "temperature"}))

plot_df_native = pd.concat(plot_df_list_native, ignore_index=True)
fig_native = px.line(plot_df_native, x="date", y="temperature", color="series",
                     title=f"XGBoost Native Multi-Output for {target_city_name} - T2M Forecasts",
                     template="plotly_white", color_discrete_map=color_map_plot, line_dash_map=line_dash_map_plot)
fig_native.update_layout(height=600, xaxis_title="Date", yaxis_title="T2M Mean Temp (°C)")
fig_native.show()

# --- SECTION 6.1 Plot differences (Native Multi-Output) ---
plot_error_df_list_native = []
for target_col in target_cols_all_horizons:
    horizon_name = target_col.replace('t2m_', '').replace('_later', ' day(s)')
    error_col = f"error_{horizon_name}"
    df_eval_native_multioutput[error_col] = df_eval_native_multioutput[f"y_true_{target_col}"] - df_eval_native_multioutput[f"y_pred_{target_col}"]
    plot_error_df_list_native.append(df_eval_native_multioutput[["date", error_col]].assign(series=f"Error ({horizon_name})").rename(columns={error_col: "error"}))

plot_error_df_native = pd.concat(plot_error_df_list_native, ignore_index=True)
fig_error_native = px.line(plot_error_df_native, x="date", y="error", color="series",
                           title=f"Forecast Error for {target_city_name} (Test Set - Native Multi-Output)",
                           labels={"error": "Prediction Error (°C)", "date": "Date"},
                           template="plotly_white", color_discrete_map=error_color_map_plot)
fig_error_native.add_hline(y=0, line_dash="dash", line_color="grey", annotation_text="Zero Error")
fig_error_native.update_layout(height=600)
fig_error_native.show()

# --- SECTION 6.2 Feature Importance (Native Multi-Output) ---
print("\n--- Feature Importances for Native Multi-Output Model ---")
# Native multi-output trees do not expose per-feature gain, so no importance table is produced here.
print("NOTE: Feature importances for native multi-output trees (multi_strategy='multi_output_tree')")
print("are not directly available via .feature_importances_ in current XGBoost versions (due to internal gain calculation limitations).")
--- Training Native Multi-Output Model for targets: ['t2m_1_day_later', 't2m_2_days_later', 't2m_3_days_later', 't2m_4_days_later', 't2m_5_days_later'] ---
Metrics for t2m_1_day_later (Native Multi-Output):
RMSE: 2.092, MAE: 1.592, R2: 0.939, Max Error: 9.345
Metrics for t2m_2_days_later (Native Multi-Output):
RMSE: 3.080, MAE: 2.364, R2: 0.867, Max Error: 11.512
Metrics for t2m_3_days_later (Native Multi-Output):
RMSE: 3.527, MAE: 2.716, R2: 0.825, Max Error: 12.744
Metrics for t2m_4_days_later (Native Multi-Output):
RMSE: 3.725, MAE: 2.874, R2: 0.805, Max Error: 13.645
Metrics for t2m_5_days_later (Native Multi-Output):
RMSE: 3.814, MAE: 2.927, R2: 0.796, Max Error: 15.915
--- Metrics for Native Multi-Output Model (Test Set) ---
| | Model | Horizon | RMSE | MAE | R2 | Max_Error |
|---|---|---|---|---|---|---|
| 0 | Native Multi-Output | 1_day day(s) | 2.092171 | 1.592071 | 0.938520 | 9.345047 |
| 1 | Native Multi-Output | 2_days day(s) | 3.079716 | 2.364106 | 0.866828 | 11.512363 |
| 2 | Native Multi-Output | 3_days day(s) | 3.526775 | 2.715950 | 0.825400 | 12.744113 |
| 3 | Native Multi-Output | 4_days day(s) | 3.725496 | 2.874296 | 0.805271 | 13.645490 |
| 4 | Native Multi-Output | 5_days day(s) | 3.814167 | 2.926581 | 0.795712 | 15.915308 |
--- Feature Importances for Native Multi-Output Model ---
NOTE: Feature importances for native multi-output trees (multi_strategy='multi_output_tree')
are not directly available via .feature_importances_ in current XGBoost versions (due to internal gain calculation limitations).
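The four metrics reported above come from scikit-learn, but they are simple enough to hand-roll; a minimal NumPy sketch that mirrors the `mean_squared_error`, `mean_absolute_error`, `r2_score`, and `max_error` calls used in the evaluation cell:

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Compute the four evaluation metrics used throughout this comparison."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),        # root mean squared error
        "MAE": float(np.mean(np.abs(err))),               # mean absolute error
        "R2": float(1 - np.sum(err ** 2)                  # 1 - SS_res / SS_tot
                    / np.sum((y_true - y_true.mean()) ** 2)),
        "Max_Error": float(np.max(np.abs(err))),          # worst single-day miss
    }

# Tiny worked example: errors of -1, 0, +2 °C over three days.
m = forecast_metrics([10.0, 12.0, 14.0], [11.0, 12.0, 12.0])
print(m)  # MAE = 1.0, Max_Error = 2.0, R2 = 0.375
```

Note that RMSE penalizes the occasional large miss more than MAE, which is why the RMSE and Max_Error columns diverge from MAE most at the longer horizons.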
7. Overall Performance Summary and Trends
This section consolidates the results from all models for direct comparison.
Show code
import pandas as pd
import plotly.express as px
# Consolidate all results collected during model deep dives
df_all_model_results = pd.DataFrame(all_model_comparison_results)
print("\n--- Consolidated Model Performance Metrics ---")
print(df_all_model_results)
# --- Bar Chart Comparison (MAE) ---
fig_mae_compare = px.bar(
df_all_model_results,
x="Horizon",
y="MAE",
color="Model",
barmode="group",
title="MAE Comparison Across Models and Forecast Horizons",
labels={"MAE": "Mean Absolute Error (°C)"},
template="plotly_white"
)
fig_mae_compare.update_layout(height=500)
fig_mae_compare.show()
# --- Bar Chart Comparison (RMSE) ---
fig_rmse_compare = px.bar(
df_all_model_results,
x="Horizon",
y="RMSE",
color="Model",
barmode="group",
title="RMSE Comparison Across Models and Forecast Horizons",
labels={"RMSE": "Root Mean Squared Error (°C)"},
template="plotly_white"
)
fig_rmse_compare.update_layout(height=500)
fig_rmse_compare.show()
# --- Line Plot Comparison (MAE vs. Horizon) ---
fig_mae_line = px.line(
df_all_model_results,
x="Horizon",
y="MAE",
color="Model",
title="MAE Trend Across Forecast Horizons for Each Model",
labels={"MAE": "Mean Absolute Error (°C)"},
template="plotly_white",
markers=True
)
fig_mae_line.update_layout(height=500)
fig_mae_line.show()
# --- Line Plot Comparison (RMSE vs. Horizon) ---
fig_rmse_line = px.line(
df_all_model_results,
x="Horizon",
y="RMSE",
color="Model",
title="RMSE Trend Across Forecast Horizons for Each Model",
labels={"RMSE": "Root Mean Squared Error (°C)"},
template="plotly_white",
markers=True
)
fig_rmse_line.update_layout(height=500)
fig_rmse_line.show()
--- Consolidated Model Performance Metrics ---

| | Model | Horizon | RMSE | MAE | R2 | Max_Error |
|---|---|---|---|---|---|---|
| 0 | Bare Bones | 1_day day(s) | 2.493334 | 1.860813 | 0.912683 | 9.279320 |
| 1 | Bare Bones | 2_days day(s) | 3.437763 | 2.617557 | 0.834063 | 12.168999 |
| 2 | Bare Bones | 3_days day(s) | 3.816496 | 2.905861 | 0.795535 | 13.472501 |
| 3 | Bare Bones | 4_days day(s) | 3.854804 | 2.964682 | 0.791518 | 13.387121 |
| 4 | Bare Bones | 5_days day(s) | 3.968461 | 3.026663 | 0.778850 | 14.513976 |
| 5 | Full Features (Non-Cascading) | 1_day day(s) | 2.011463 | 1.489929 | 0.943172 | 9.373296 |
| 6 | Full Features (Non-Cascading) | 2_days day(s) | 3.084712 | 2.352502 | 0.866396 | 11.284673 |
| 7 | Full Features (Non-Cascading) | 3_days day(s) | 3.617922 | 2.787716 | 0.816258 | 13.461348 |
| 8 | Full Features (Non-Cascading) | 4_days day(s) | 3.716141 | 2.879343 | 0.806247 | 13.819242 |
| 9 | Full Features (Non-Cascading) | 5_days day(s) | 3.929810 | 3.053671 | 0.783137 | 14.496209 |
| 10 | Cascading | 1_day day(s) | 2.011463 | 1.489929 | 0.943172 | 9.373296 |
| 11 | Cascading | 2_days day(s) | 3.142463 | 2.382870 | 0.861346 | 12.095691 |
| 12 | Cascading | 3_days day(s) | 3.671489 | 2.797543 | 0.810777 | 14.031865 |
| 13 | Cascading | 4_days day(s) | 3.939334 | 3.025852 | 0.782275 | 13.395379 |
| 14 | Cascading | 5_days day(s) | 4.013391 | 3.094258 | 0.773814 | 13.291235 |
| 15 | Native Multi-Output | 1_day day(s) | 2.092171 | 1.592071 | 0.938520 | 9.345047 |
| 16 | Native Multi-Output | 2_days day(s) | 3.079716 | 2.364106 | 0.866828 | 11.512363 |
| 17 | Native Multi-Output | 3_days day(s) | 3.526775 | 2.715950 | 0.825400 | 12.744113 |
| 18 | Native Multi-Output | 4_days day(s) | 3.725496 | 2.874296 | 0.805271 | 13.645490 |
| 19 | Native Multi-Output | 5_days day(s) | 3.814167 | 2.926581 | 0.795712 | 15.915308 |
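The consolidated table can be boiled down to a best-model-per-horizon view; a minimal sketch using the RMSE values printed above (note that the 1-day Cascading model is identical to 1-day Full Features, so that horizon is a tie):

```python
# RMSE by (model, horizon) as reported in the consolidated metrics above.
rmse = {
    ("Bare Bones", 1): 2.493334, ("Bare Bones", 2): 3.437763,
    ("Bare Bones", 3): 3.816496, ("Bare Bones", 4): 3.854804,
    ("Bare Bones", 5): 3.968461,
    ("Full Features (Non-Cascading)", 1): 2.011463,
    ("Full Features (Non-Cascading)", 2): 3.084712,
    ("Full Features (Non-Cascading)", 3): 3.617922,
    ("Full Features (Non-Cascading)", 4): 3.716141,
    ("Full Features (Non-Cascading)", 5): 3.929810,
    ("Cascading", 1): 2.011463, ("Cascading", 2): 3.142463,
    ("Cascading", 3): 3.671489, ("Cascading", 4): 3.939334,
    ("Cascading", 5): 4.013391,
    ("Native Multi-Output", 1): 2.092171, ("Native Multi-Output", 2): 3.079716,
    ("Native Multi-Output", 3): 3.526775, ("Native Multi-Output", 4): 3.725496,
    ("Native Multi-Output", 5): 3.814167,
}

# Keep the lowest-RMSE model for each horizon (ties keep the first entry seen).
best = {}
for (model, horizon), value in rmse.items():
    if horizon not in best or value < best[horizon][1]:
        best[horizon] = (model, value)

for h in sorted(best):
    print(f"Day {h}: {best[h][0]} (RMSE {best[h][1]:.3f})")
```

Running this shows Full Features winning days 1 and 4 and Native Multi-Output winning days 2, 3, and 5, i.e. the native multi-output strategy pays off mainly at longer lead times.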
8. Conclusion and Future Work
Conclusion
All three full-feature strategies beat the Bare Bones baseline at every horizon, confirming that the derived, cross-city, and rolling-window features carry real signal. At 1 day ahead, the independent Full Features model is strongest (MAE 1.490 °C, RMSE 2.011 °C). From 2 days onward, the Native Multi-Output model takes the lead on RMSE (3.814 °C vs. 3.930 °C for Full Features at 5 days), suggesting that learning the five horizons jointly helps at longer lead times. Cascading offers no advantage over independent models and degrades fastest at 4-5 days: its feature importances show each horizon leaning heavily (roughly 0.40 of total gain) on the previous horizon's prediction, so errors compound through the chain. As expected, all models degrade with lead time, with MAE roughly doubling between day 1 and day 5.
Future Work
Promising next steps include hyperparameter tuning for the Native Multi-Output model, true out-of-fold (walk-forward) validation in place of the single train/test split used here, exploring additional engineered features, and comparing against other model families.
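One of the suggested next steps, walk-forward (out-of-fold) validation, can be sketched without any extra libraries. This is an illustrative pure-Python version with an expanding training window; the fold count and test size below are illustrative choices, not tuned values:

```python
def walk_forward_splits(n_rows, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs where training data always precedes
    the test block, mimicking an expanding-window time-series evaluation."""
    for k in range(n_folds):
        test_end = n_rows - (n_folds - 1 - k) * test_size
        test_start = test_end - test_size
        yield list(range(0, test_start)), list(range(test_start, test_end))

# e.g. the 9087-row engineered dataset with 3 folds of 365 test days each
for train_idx, test_idx in walk_forward_splits(9087, 3, 365):
    print(len(train_idx), len(test_idx))  # training window grows each fold
```

Averaging the per-horizon metrics over these folds would give a more robust model ranking than the single 2023-2024 test window used above.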